feat: migrating to validating admission policies for our validating logic (1/)#708
feat: migrating to validating admission policies for our validating logic (1/)#708michaelawyu (michaelawyu) wants to merge 4 commits into
Conversation
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
|
Please note that for this PR the manager is not enabled in the hub agent yet. To control the PR size, additional tests will be submitted as a separate PR. |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
This PR introduces the first slice of work to move KubeFleet's validation logic from validating webhooks to Kubernetes ValidatingAdmissionPolicies (VAP). It adds a startup-time policy manager that reconciles a configurable set of generated VAPs and their bindings, plus two initial generators (deny pod/replicaset writes outside reserved namespaces, deny serviceaccount/token writes in reserved namespaces). It also exports the namespace prefix constants from pkg/utils.
Changes:
- New
pkg/admissionpolicymanagerpackage: manager that create-or-updates and garbage-collects VAPs labeled by Fleet, with two generators (pods/replicasets, serviceaccounts/tokenrequests). - Integration test suite (
envtest) validating the manager produces the expected VAP/VAPB objects (effects deferred to E2E). - Exports
KubePrefix,FleetPrefix,FleetMemberNamespacePrefixfrompkg/utils/common.gofor reuse in the generator configs.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/utils/common.go | Exports previously unexported namespace prefix constants so the admission policy generator configs can reuse them. |
| pkg/admissionpolicymanager/manager.go | Core policy manager: reflection-based generator selection, create-or-update with retry, GC of orphan policies/bindings via label selector. |
| pkg/admissionpolicymanager/configs.go | Top-level configuration struct exposing per-generator config and a default. |
| pkg/admissionpolicymanager/podsnreplicasets.go | Generator producing a VAP that denies CREATE of pods/replicasets outside reserved namespaces. |
| pkg/admissionpolicymanager/svcaccountsntokenreqs.go | Generator producing a VAP that denies serviceaccount CRUD and tokenrequest CREATE in reserved namespaces, with user/group whitelist. |
| pkg/admissionpolicymanager/suite_test.go | Ginkgo/envtest suite bootstrap; starts the policy manager once. |
| pkg/admissionpolicymanager/manager_integration_test.go | Asserts the expected VAP and VAPB objects exist with correct spec/labels. |
| // Exempt whitelisted users from this admission policy. | ||
| for _, username := range g.WhitelistedUsernames { | ||
| celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`request.userInfo.username == "%s"`, username)) | ||
| } | ||
| // Exempt whitelisted user groups from this admission policy. | ||
| for _, userGroup := range g.WhitelistedUserGroups { | ||
| celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`"%s" in request.userInfo.groups`, userGroup)) | ||
| } | ||
| // Exempt requests from the Kubernetes scheduler, any of the nodes, and (esp.) the | ||
| // Kubernetes controller manager from this admission policy. | ||
| // | ||
| // Important: the Kubernetes controller manager, when deployed with the option | ||
| // --use-service-account-credentials=true, creates a service account token for many of its controllers | ||
| // and uses those tokens to authenticate to the Kubernetes API server. It retrieves a token | ||
| // via the TokenRequest API; failure to exempt this scenario may lead to critical errors. | ||
| celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`request.userInfo.username == "%s"`, kubeSchedulerUserName)) | ||
| celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`request.userInfo.username == "%s"`, kubeControllerManagerUserName)) | ||
| celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`"%s" in request.userInfo.groups`, kubeNodeUserGroup)) | ||
| // Exempt requests from admin users from this admission policy. | ||
| celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`"%s" in request.userInfo.groups`, adminUserGroup)) | ||
|
|
||
| celExprAcc := strings.Join(celExprAccSegs, " || ") | ||
|
|
||
| celExprNSSegs := []string{} | ||
| for _, prefix := range g.ReservedNamespacePrefixes { | ||
| celExprNSSegs = append(celExprNSSegs, fmt.Sprintf(`object.metadata.namespace.startsWith("%s")`, prefix)) | ||
| } | ||
| celExprNS := strings.Join(celExprNSSegs, " || ") | ||
|
|
||
| celExpr := fmt.Sprintf("!(%s) || (%s)", celExprNS, celExprAcc) | ||
|
|
There was a problem hiding this comment.
Note: this restriction is added in consistency with our existing webhook setup. system:masters is supposed to have unlimited access to all resources BTW.
| // The error message has been set to match with the checks in some of the E2E tests | ||
| // in our existing release pipeline. | ||
| Message: "creating pods and replicas is disallowed in the fleet hub cluster", |
There was a problem hiding this comment.
N/A: as mentioned, the error msg is explicitly kept in the same style as some of the existing checks for releasing purposes.
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
Signed-off-by: michaelawyu <chenyu1@microsoft.com>
|
|
||
| // AllGenerators returns a copy of all available policy generators. | ||
| func AllGenerators() sets.Set[string] { | ||
| return allGenerators.Clone() |
There was a problem hiding this comment.
Can you get all generator names by reflecting over DefaultPolicyGeneratorConfigs the same way preparePolicyGenerators do?
Keeping them in sync can prevent people adding new ValidatingAdmissionPolicyGenerator but forget to add to one of allGenerators or DefaultPolicyGeneratorConfigs
| ) | ||
|
|
||
| // reservedNamespacePrefixRegexp matches valid namespace prefix characters (DNS label subset). | ||
| var reservedNamespacePrefixRegexp = regexp.MustCompile(`^[a-z0-9-]+$`) |
There was a problem hiding this comment.
This should be in common.go (or another common file that doesn't belong to any specific VAP Generator) because it is used by both podsnreplicasets.go and svcaccountsntokenreqs.go?
| type ValidatingAdmissionPolicyGenerator interface { | ||
| Name() string | ||
| Validate() error | ||
| Policies() []*admissionregistrationv1.ValidatingAdmissionPolicy |
There was a problem hiding this comment.
Instead of independent Policies and PolicyBindings members, can we do
type PolicyWithBindings struct {
Policy admissionregistrationv1.ValidatingAdmissionPolicy
Bindings []admissionregistrationv1.ValidatingAdmissionPolicyBinding
}
type ValidatingAdmissionPolicyGenerator interface {
Name() string
Validate() error
PoliciesWithBindings() []PolicyWithBindings
}?
The concern is that right now the relationship between Policy and Bindings is not explicit and is more error-proned (e.g. someone created a Policy but no Binding; someone deleted a Policy but forgot to delete the corresponding Bindings)
| if policy.Labels == nil { | ||
| policy.Labels = make(map[string]string) | ||
| } | ||
| policy.Labels[VAPManagedByKubeFleetLabelKey] = VAPManagedByKubeFleetLabelValue |
There was a problem hiding this comment.
If you define a map like this on top of file
var AdmissionPolicyManagerLabels = map[string]string{
VAPManagedByKubeFleetLabelKey: VAPManagedByKubeFleetLabelValue,
VAPPartOfKubeFleetLabelKey: VAPPartOfKubeFleetLabelValue,
VAPComponentKubeFleetLabelKey: VAPComponentAdmissionPolicyManagerLabelValue,
}this can be replaced with
maps.Copy(policy.Labels, AdmissionPolicyManagerLabels)can do similar things to 2 other places in this file since these 3 labels and values are always used together for all of AdmissionPolicyManager usages
| if policyBinding.Labels == nil { | ||
| policyBinding.Labels = make(map[string]string) | ||
| } | ||
| policyBinding.Labels[VAPManagedByKubeFleetLabelKey] = VAPManagedByKubeFleetLabelValue |
There was a problem hiding this comment.
If you define a map like this on top of file
var AdmissionPolicyManagerLabels = map[string]string{
VAPManagedByKubeFleetLabelKey: VAPManagedByKubeFleetLabelValue,
VAPPartOfKubeFleetLabelKey: VAPPartOfKubeFleetLabelValue,
VAPComponentKubeFleetLabelKey: VAPComponentAdmissionPolicyManagerLabelValue,
}this can be replaced with
maps.Copy(policyBinding.Labels, AdmissionPolicyManagerLabels)| err := retry.OnError(policyRWOpBackoff, func(err error) bool { | ||
| // Retry on any error expect for context cancellation. Note that nil errors are not passed to this function, | ||
| // and AlreadyExists errors will not occur. | ||
| if ctx.Err() != nil { |
There was a problem hiding this comment.
func retryOnAnyErrorUnlessCancelled(ctx context.Context) func(error) bool {
return func(err error) bool {
return ctx.Err() == nil
}
}and then replace 6 of these usages with
retry.OnError(policyRWOpBackoff, retryOnAnyErrorUnlessCancelled(ctx), func() error {
// ...
})?
| for _, gen := range m.enabledPolicyGenerators { | ||
| // As a sanity check, do one more round of validation. | ||
| // | ||
| // Normally this check would never fail as the generators have been validated before |
There was a problem hiding this comment.
This comment seems to be suggesting that in manager initialization (i.e. New method), the Validate methods were called. But they were not called in New?
| Rule: admissionregistrationv1.Rule{ | ||
| APIGroups: []string{""}, | ||
| APIVersions: []string{"v1"}, | ||
| Resources: []string{"pods"}, |
There was a problem hiding this comment.
We have constants for "pods" and "replicasets" in webhook.go. Shall we start sharing them across webhook package and this new package by pulling them out into a common package? And eventually webhook.go will be deleted in one PR together with other webhook logic? Want to make sure that we still have these as constants after replacing webhook by VAP.
| // Check if any whitelisted username or user group contains characters that are | ||
| // illegal in a CEL string literal (\ and "). | ||
| for _, username := range g.WhitelistedUsernames { | ||
| if strings.ContainsAny(username, `"\`) { |
There was a problem hiding this comment.
These 2 illegal characters are applicable to all CELs, so a similar check needs to be done for many more future VAP generators? Maybe extract into a function so that it is more discoverable for future generators?
// ValidateCELStringLiterals checks that none of the values contain characters
// that would break or inject into a CEL double-quoted string literal.
// The fieldName is used in the error message for context.
func ValidateCELStringLiterals(values []string, fieldName string) error {
for _, v := range values {
if strings.ContainsAny(v, `"\`) {
return errors.NewUserError(nil,
fmt.Sprintf("%s contains characters illegal in a CEL string literal", fieldName),
fieldName, v)
}
}
return nil
}There was a problem hiding this comment.
So these 2 become:
if err := ValidateCELStringLiterals(g.WhitelistedUsernames, "whitelisted username"); err != nil {
return err
}
if err := ValidateCELStringLiterals(g.WhitelistedUserGroups, "whitelisted user group"); err != nil {
return err
}|
|
||
| // Exempt whitelisted users from this admission policy. | ||
| for _, username := range g.WhitelistedUsernames { | ||
| celExprAccSegs = append(celExprAccSegs, fmt.Sprintf(`request.userInfo.username == "%s"`, username)) |
There was a problem hiding this comment.
Have a concern that this CEL construction code is not very readable. Might be difficult for someone to spot a mistake if there is any.
Wondering if it is worth investing in a celbuilder.go like this?
// CELExpr represents a CEL expression fragment.
type CELExpr string
// String returns the CEL expression as a string.
func (e CELExpr) String() string {
return string(e)
}
// UsernameEquals returns a CEL expression matching a specific username.
func UsernameEquals(username string) CELExpr {
return CELExpr(fmt.Sprintf(`request.userInfo.username == "%s"`, username))
}
// InGroup returns a CEL expression matching membership in a group.
func InGroup(group string) CELExpr {
return CELExpr(fmt.Sprintf(`"%s" in request.userInfo.groups`, group))
}
// NamespaceStartsWith returns a CEL expression matching a namespace prefix.
func NamespaceStartsWith(prefix string) CELExpr {
return CELExpr(fmt.Sprintf(`request.namespace.startsWith("%s")`, prefix))
}
// Or joins expressions with ||.
func Or(exprs ...CELExpr) CELExpr {
parts := make([]string, len(exprs))
for i, e := range exprs {
parts[i] = string(e)
}
return CELExpr(strings.Join(parts, " || "))
}
// Not negates an expression.
func Not(expr CELExpr) CELExpr {
return CELExpr(fmt.Sprintf("!(%s)", expr))
}then we can build and use CEL like this:
func (g *SvcAccountsAndTokenRequestsVAPGenerator) Policies() []*admissionregistrationv1.ValidatingAdmissionPolicy {
// Namespace condition
var nsClauses []CELExpr
for _, prefix := range g.ReservedNamespacePrefixes {
nsClauses = append(nsClauses, NamespaceStartsWith(prefix))
}
// User exemptions
var exemptions []CELExpr
for _, u := range g.WhitelistedUsernames {
exemptions = append(exemptions, UsernameEquals(u))
}
for _, grp := range g.WhitelistedUserGroups {
exemptions = append(exemptions, InGroup(grp))
}
exemptions = append(exemptions,
UsernameEquals(kubeSchedulerUserName),
UsernameEquals(kubeControllerManagerUserName),
InGroup(kubeNodeUserGroup),
InGroup(adminUserGroup),
)
// Allow if: not reserved namespace OR user is exempt
celExpr := Or(Not(Or(nsClauses...)), Or(exemptions...))
policy := &admissionregistrationv1.ValidatingAdmissionPolicy{
// ...
Spec: admissionregistrationv1.ValidatingAdmissionPolicySpec{
Validations: []admissionregistrationv1.Validation{
{
Expression: celExpr.String(),
Message: "writing service accounts in reserved namespaces or requesting tokens from such service accounts is disallowed",
Reason: ptr.To(metav1.StatusReasonForbidden),
},
},
},
}
return []*admissionregistrationv1.ValidatingAdmissionPolicy{policy}
}There was a problem hiding this comment.
podsnreplicasets.go can also do similar:
func (g *PodsAndReplicaSetsVAPGenerator) Policies() []*admissionregistrationv1.ValidatingAdmissionPolicy {
var nsClauses []CELExpr
for _, prefix := range g.ReservedNamespacePrefixes {
nsClauses = append(nsClauses, NamespaceStartsWith(prefix))
}
celExpr := Not(Or(nsClauses...))
policy := &admissionregistrationv1.ValidatingAdmissionPolicy{
// ...
Spec: admissionregistrationv1.ValidatingAdmissionPolicySpec{
Validations: []admissionregistrationv1.Validation{
{
Expression: celExpr.String(),
Message: "creating pods and replicas is disallowed in the fleet hub cluster",
Reason: ptr.To(metav1.StatusReasonForbidden),
},
},
},
}
return []*admissionregistrationv1.ValidatingAdmissionPolicy{policy}
}
Description of your changes
This PR is the first of many following PRs to migrate to validating admission policies for applicable validation logic.
Specifically, this PR:
Releated to #707.
I have:
make reviewableto ensure this PR is ready for review.How has this code been tested
Special notes for your reviewer